AN ALGORITHM WHICH RECOGNIZES NATURAL LANGUAGE
DIALOGUE EXPRESSIONS


COLBY AND PARKISON

OUTLINE
INTRODUCTORY - Discussion of language as code, other approaches:
    sentence versus word dictionary using projection
    rules to yield an interpretation from word definitions;
    experience with old PARRY.
PROBLEMS - dialogue problems and methods. Constraints. Special cases.
Preprocessing - dict words only
    translations
    (ROGER- WHAT HAPPENED TO FILLERS?)
    contractions
    expansions
    synonyms
    negation
Segmenting - prepositions, wh-words, meta-verbs
    give list
Matching - simple and compound patterns
    association with semantic functions
    first coarsening - drop fillers - give list
    second coarsening - drop one word at a time
    dangers of insertion and restoration
    Recycle condition - sometimes a pattern containing pronouns
        is matched, like "DO YOU AVOID THEM". If THEM could be
        a number of different things and PARRY's answer depends on
        which one it is, then the current value of the anaphora,
        THEM, is substituted for THEM and the resulting pattern
        is looked up. Hopefully, this will produce a match to a
        more specific pattern, like "DO YOU AVOID MAFIA".
    default condition - pass surface to memory;
        change topic or level
Advantages - real-time performance, pragmatic adequacy and
    effectiveness, performance measures;
    "learning" by adding patterns.
    PARRY1 ignored word order - penalty too great.
    PARRY1 too sequential, taking first pattern it found
        rather than looking at whole input and then deciding.
    PARRY1 had patterns strung out throughout procedures,
        thus cumbersome for programmer to see what patterns were.
Limitations - typical failures, possible remedies:
    NO NOUNS, ETC. - NO MORE GENERAL RULES LIKE NOUN PHRASE;
    AMBIGUITIES SLIDE THROUGH WHEREAS A PARSER WOULD CATCH THEM
Summary

INTRODUCTION

To recognize is to identify something as an instance of the
"same again". This familiarity is possible because of recurrent
characteristics of the world which repeat themselves over and over
again. We shall describe an algorithm which recognizes recurrent
characteristics of natural language dialogue expressions through a
multi-stage sequence of functions which progressively transforms
these input expressions into a pattern which eventually best matches
a more abstract stored pattern. The name of the stored pattern has a
pointer to the name of a response function which decides what to do
once the input has been characterized. Here we shall discuss
only the recognizing functions, except for one response function
(anaphoric substitution) which interactively aids the
characterization process. How the response functions operate will be
described in a future communication (Faught, Colby, Parkison).
In constructing and testing a simulation of paranoid
processes, we were faced with the problem of reproducing paranoid
linguistic behavior in a diagnostic psychiatric interview. The
diagnosis of paranoid states, reactions or modes is made by
clinicians who judge a degree of correspondence between what they
observe linguistically in an interview and their conceptual model of
paranoid behavior. There exists a high degree of agreement about this
conceptual model, which relies mainly on what an interviewee says and
how he says it.
Natural language is a life-expressing code people use for
communication with themselves and others. In a real-life dialogue
such as a psychiatric interview, the participants have interests,
intentions, and expectations which are revealed in their linguistic
expressions. To produce effects on an interviewer which he would
judge similar to the effects produced by a paranoid patient, an
interactive simulation of a paranoid patient must be able to
demonstrate typical paranoid interview behavior. To achieve the
desired effects, the paranoid model must have the ability to deal
with the linguistic behavior of the interviewer.
There are a number of approaches one might consider for an
ideal handling of dialogue expressions. One approach would be
to construct a dictionary of all expressions which could possibly
arise in an interview. Associated with each expression would be its
interpretation, depending on dialogue context, which in turn must
somehow be defined. For obvious reasons, no one takes this approach
seriously. Instead of an expression dictionary, one might
construct a word dictionary and then use projection rules to yield an
interpretation of a sentence from the dictionary definitions. This
has been the approach adopted by conventional linguistic parsers.
Such a method performs adequately as long as the dictionary involves
only a few hundred words, each word having only one or two senses,
and the dialogue is limited to a mini-world of a few objects and
relations. But the problems which arise in a psychiatric interview
conducted in unrestricted English are too great for this method to be
useful for real-time dialogues, in which immediacy of response is an
important requirement.
There is little consensus knowledge about how humans process
natural language. They can be shown to possess some knowledge of
grammar rules, but this does not entail that they use a grammar in
interpreting and producing language. Irregular verb-tenses and
noun-plurals do not follow rules; yet people use thousands of them.
One school of linguistics believes that people possess full
transformational grammars for processing language. In our view this
position seems dubious. Language is what is recognized, but the
processes involved may not be at all linguistic. Originally,
transformational grammars were not designed to "understand" a large
subset of English; they represented a set of axioms for deciding
whether a string is "grammatical". Efforts to use them for other
purposes have not been fruitful.
An analysis of what one's problem actually is should guide
the selection or invention of methods appropriate to that problem's
solution. Our problem was not to develop a consistent and general
theory of language nor to assert empirically testable hypotheses
about how people process language. Our problem was to write an
algorithm which recognizes what is being said in a dialogue and what
is being said about it in order to make a response such that a sample
of I-O pairs from the paranoid model is judged similar to a sample of
I-O pairs from paranoid patients. From the perspective of
artificial intelligence as an attempt to get computers to perform
human intellectual tasks, methods had to be devised for the task of
participating in a human dialogue in a paranoid-patient-like way. We
are not making an existence claim that our strategy represents the
way people process language. We sought efficacious methods which
could operate efficiently in real time. We would not deny that our
methods for this task are possibly analogous to the methods humans
use. And since our methods provide a general way of mapping from many
"surface" expressions to a single stored pattern, these methods are
useful for any type of dialogue algorithm.
To perform the task of managing communicative uses and
effects of natural language, we adopted a strategy of transforming
the input until a pattern is achieved which completely or partially
matches a more abstract stored pattern. This strategy has proved
adequate for our purposes a satisfactory percentage of the time. (No
one expects an algorithm to be successful 100% of the time, since not
even humans, the best natural language systems around, achieve this
level of performance.) The power of this method for natural language
dialogues lies in its ability to ignore unrecognizable expressions
and irrelevant details. A conventional parser doing word-by-word
analysis fails when it cannot find one or more of the input words in
its dictionary. It is too fragile for a dialogue in that it must know
every word; it cannot guess.
In early versions of the paranoid model (PARRY1), many (but
not all) of the pattern recognition mechanisms were weak because they
allowed the elements of the pattern to be order-independent. For
example, consider the following expressions:
    (1) WHERE DO YOU WORK?
    (2) WHAT SORT OF WORK DO YOU DO?
    (3) WHAT IS YOUR OCCUPATION?
    (4) WHAT DO YOU DO FOR A LIVING?
    (5) WHERE ARE YOU EMPLOYED?
In PARRY1 a procedure would scan these expressions looking for an
information-bearing contentive such as "work", "for a living", etc.
If it found such a contentive along with a "you" or "your" in the
expression, regardless of word order, it would respond to the
expression as if it were a question about the nature of one's work.
(There is some doubt this even qualifies as a pattern, since
interrelations between words are ignored and only their presence is
considered.) An insensitivity to word order has the advantage that
lexical items representing different parts of speech can represent
the same concept, e.g. "work" as noun or as verb. But a price is paid
for this resilience and elasticity. We found from experience that,
since English relies heavily on word order to convey the meaning of
its messages, the penalty of misunderstanding (to be distinguished
from non-understanding) was too great. Hence in PARRY2, as will be
described shortly, all the patterns require a specified word order.
It seems agreed that for high-complexity problems it is
useful to have constraints. Diagnostic psychiatric interviews
(and especially those conducted over teletypes) have several natural
constraints. First, clinicians are trained to ask certain questions
in certain ways. These stereotypes can be treated as special cases.
Second, only a few hundred standard topics are brought up by
interviewers, who are trained to use everyday expressions and
especially those used by the patient himself. When the interview is
conducted over teletypes, expressions tend to be shortened since the
interviewer tries to increase the information transmission rate over
the slow channel of a teletype. (It is said that short expressions
are more grammatical, but think about the phrase "Now now, there
there.") Finally, teletyped interviews represent written speech.
Speech is known to be 50% redundant; hence many unrecognized words
can be ignored without losing the meaning of the message. Written
speech is loaded with idioms, cliches, pat phrases, etc. - all
being easy prey for a pattern-recognition approach. It is futile to
try to decode an idiom by analyzing the meanings of its individual
words. One knows what an idiom means or one does not.
We shall now describe the pattern recognition functions of the
algorithm.

PREPROCESSING

Each word in the input expression is first looked up in a
dictionary of (1300) words which remains in core in machine language.
The dictionary consists of a list of words and the names of
word-classes they can be translated into. The size of the dictionary
is determined by the patterns, i.e. each dictionary word appears in
one or more patterns. (ROGER- SHOW PIECE OF DICT?) Words in the
dictionary reflect PARRY2's main interests. If a word in the input is
not in the dictionary, it is dropped from the pattern being formed.
Thus if the input were:
    WHAT IS YOUR CURRENT OCCUPATION?
and the word "current" is not in the dictionary, the pattern at this
stage becomes:
    (WHAT IS YOUR OCCUPATION)
The question mark is thrown away as redundant, since questions are
recognized by word order. (A statement followed by a question mark
(YOU GAMBLE?) is considered to be communicatively equivalent in its
effects to that statement followed by a period.) Synonymic
translations of words are made, so that the pattern becomes, for
example:
    (WHAT BE YOU JOB)
Groups of words are translated into a single word-class name, so
that, for example, "for a living" becomes "job".
Misspellings are also handled in the dictionary by
simply rewriting a recognized misspelling into its correct form. Thus
"yyou" becomes "you". The common misspellings were gathered from over
4000 interviews with versions of the paranoid model. Other
misspellings do not appear in the pattern because they are not
represented in the dictionary.
Certain juxtaposed words are contracted into a single
word, e.g. "GET ALONG WITH" becomes "GETALONGWITH". This is done (1)
to deal with groups of words which are represented as a single
element in the stored pattern, and (2) to prevent segmentation from
occurring at the wrong places, such as at a preposition inside an
idiom. Besides these contractions, certain expansions are made, so
that, for example, "DON'T" becomes "DO NOT" and "I'D" becomes "I
WOULD".
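
To make these stages concrete, here is a minimal sketch of the
preprocessing pass in Python. The dictionary, group, expansion, and
misspelling tables are illustrative stand-ins, not PARRY2's actual
data (the real dictionary held about 1300 words in core, in machine
language).

```python
# A sketch of preprocessing: misspelling repair, expansions, group
# contractions, and dictionary translation with unknown words dropped.
# All table entries below are illustrative.
MISSPELLINGS = {"YYOU": "YOU"}
EXPANSIONS = {"DON'T": ["DO", "NOT"], "I'D": ["I", "WOULD"]}
# Word groups contracted into a single pattern element.
GROUPS = {("FOR", "A", "LIVING"): "JOB",
          ("GET", "ALONG", "WITH"): "GETALONGWITH"}
# The dictionary maps each known word to the word-class name it is
# translated into; a word absent from the dictionary is dropped.
DICTIONARY = {"WHAT": "WHAT", "IS": "BE", "ARE": "BE", "YOU": "YOU",
              "YOUR": "YOU", "OCCUPATION": "JOB", "WORK": "JOB",
              "DO": "DO", "NOT": "NOT", "I": "I", "WOULD": "WOULD",
              "JOB": "JOB", "GETALONGWITH": "GETALONGWITH"}

def preprocess(expression):
    # The question mark is thrown away; questions are recognized
    # by word order.
    words = expression.replace("?", "").upper().split()
    # Repair recognized misspellings and expand contractions.
    expanded = []
    for w in words:
        w = MISSPELLINGS.get(w, w)
        expanded.extend(EXPANSIONS.get(w, [w]))
    # Contract known word groups into single elements.
    out, i = [], 0
    while i < len(expanded):
        for group, single in GROUPS.items():
            if tuple(expanded[i:i + len(group)]) == group:
                out.append(single)
                i += len(group)
                break
        else:
            out.append(expanded[i])
            i += 1
    # Translate via the dictionary; unknown words are dropped.
    return [DICTIONARY[w] for w in out if w in DICTIONARY]

print(preprocess("WHAT IS YOUR CURRENT OCCUPATION?"))
# -> ['WHAT', 'BE', 'YOU', 'JOB']
```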

SEGMENTING

Another weakness in the crude pattern matching of PARRY1 was
that it took the entire input expression as its basic processing
unit. Hence if only two words were recognized in an eight-word input,
the risk of misunderstanding was great. We needed a way of dealing
with units shorter than the entire input expression.
Expert telegraphists stay six to twelve words behind a
received message before transcribing it (Bryan and Harter, 1897).
Translators wait until they have heard 4-6 words before they begin
translating. Aided by a heuristic from machine-translation work by
Wilks ( ), we devised a way of bracketing the pattern constructed up
to this point into shorter segments using the list of words in Fig. 1.
The new pattern formed is termed either "simple", having no
delimiters within it, or "compound", i.e. being made up of two or more
simple patterns. A simple pattern might be:
    ( WHAT BE YOU JOB )
whereas a compound pattern would be:
    (( WHY BE YOU ) ( IN HOSPITAL ))
Our experience with this method of segmentation shows that compound
patterns from psychiatric dialogues rarely consist of more than three
or four fragments.
After certain verbs ("THINK", "FEEL", etc.) a bracketing occurs
to replace the commonly omitted "THAT", such that:
    ( I THINK YOU BE AFRAID )
becomes
    (( I THINK ) ( YOU BE AFRAID ))
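
The following sketch shows the kind of bracketing involved. Since the
delimiter list of Fig. 1 is not reproduced here, the delimiter and
meta-verb sets below are small illustrative samples of the
prepositions, wh-words, and meta-verbs the text mentions.

```python
# A sketch of segmentation. DELIMITERS stands in for the Fig. 1 list;
# its entries are illustrative only.
DELIMITERS = {"IN", "ON", "AT", "TO", "WHY", "WHAT", "WHEN", "WHERE"}
META_VERBS = {"THINK", "FEEL"}  # bracket after these for omitted "THAT"

def segment(pattern):
    """Bracket a preprocessed pattern into simple patterns.

    A delimiter opens a new segment; a meta-verb closes the current
    segment after itself. Returns a simple pattern (one segment) or
    a compound pattern (a list of simple patterns).
    """
    segments, current = [], []
    for word in pattern:
        if word in DELIMITERS and current:
            segments.append(current)   # delimiter opens a new segment
            current = [word]
        else:
            current.append(word)
            if word in META_VERBS:
                segments.append(current)   # meta-verb closes a segment
                current = []
    if current:
        segments.append(current)
    return segments if len(segments) > 1 else segments[0]

print(segment(["WHY", "BE", "YOU", "IN", "HOSPITAL"]))
# -> [['WHY', 'BE', 'YOU'], ['IN', 'HOSPITAL']]
print(segment(["I", "THINK", "YOU", "BE", "AFRAID"]))
# -> [['I', 'THINK'], ['YOU', 'BE', 'AFRAID']]
```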

PREPARATION FOR MATCHING

Conjunctions serve only as markers for the segmenter, and
they are dropped out after segmentation.
Negations are handled by extracting the "NOT" from the
pattern and assigning a value to a global variable which indicates to
the algorithm that the expression is negative in form. When a pattern
is finally matched, this variable is consulted. Some patterns have a
pointer to a pattern of opposite meaning if a "NOT" could reverse
their meanings. If this pointer is present and a "NOT" is found,
then the pattern matched is replaced by its opposite, e.g. ( I NOT
TRUST YOU ) is replaced by the pattern ( I MISTRUST YOU ). We have
not yet observed the troublesome case of "he gave me not one but two
messages". (There is no need to scratch where it doesn't itch.)
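
A sketch of this bookkeeping follows. The opposite-pattern table holds
the text's running example; in PARRY2 itself the negation flag is a
global variable, which the return value stands in for here.

```python
# A sketch of negation handling. Only some stored patterns carry a
# pointer to an opposite-meaning pattern; this table is illustrative.
OPPOSITES = {("I", "TRUST", "YOU"): ("I", "MISTRUST", "YOU")}

def extract_negation(pattern):
    """Strip NOT from the pattern and flag the expression as negative."""
    if "NOT" in pattern:
        return [w for w in pattern if w != "NOT"], True
    return pattern, False

def apply_negation(matched_pattern, negated):
    # Consulted once a pattern is finally matched: if a NOT was seen
    # and the matched pattern points to an opposite, use the opposite.
    key = tuple(matched_pattern)
    if negated and key in OPPOSITES:
        return list(OPPOSITES[key])
    return matched_pattern

pattern, negated = extract_negation(["I", "NOT", "TRUST", "YOU"])
print(pattern)                            # ['I', 'TRUST', 'YOU']
print(apply_negation(pattern, negated))   # ['I', 'MISTRUST', 'YOU']
```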

MATCHING AND RECYCLING

The algorithm now attempts to match the segmented patterns
with stored patterns, of which there are currently about 2000.
First a complete and perfect match is sought. When a match is found,
the stored pattern name has a pointer to the name of a response
function which decides what to do further. If a match is not found,
further transformations of the pattern are carried out and a "fuzzy"
match is tried.
For fuzzy matching at this stage, the elements in the pattern
are dropped one at a time and a match attempted each time. This
allows the algorithm to ignore familiar words in unfamiliar contexts.
For example, "well" is important in "Are you well?" but meaningless
in "Well are you?".
Deleting one element at a time results in, for example, the
pattern:
    ( WHAT BE YOU MAIN PROBLEM )
becoming successively:
    (a) ( BE YOU MAIN PROBLEM )
    (b) ( WHAT YOU MAIN PROBLEM )
    (c) ( WHAT BE MAIN PROBLEM )
    (d) ( WHAT BE YOU PROBLEM )
    (e) ( WHAT BE YOU MAIN )
Since the stored pattern in this case matches (d), (e) would not be
constructed. We found it unwise to delete more than one element,
since our segmentation method usually yields segments containing a
small number (1-4) of words.
Dropping an element at a time provides a probability
threshold for partial matching which is a function of the length of
the segment. If a segment consists of five elements, four of the five
must be present in a particular order (with the fifth element missing
in any position) for a match to occur. If a segment contains four
elements, three must match - and so forth.
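
In Python, the single-deletion fuzzy match might look like the sketch
below. The stored-pattern table and its response-function name
(ANSWER-PROBLEM) are illustrative assumptions.

```python
# A sketch of fuzzy matching by dropping one element at a time. The
# stored pattern table and response-function name are illustrative.
STORED = {("WHAT", "BE", "YOU", "PROBLEM"): "ANSWER-PROBLEM"}

def match(pattern, stored=STORED):
    """Seek a perfect match, then try each single-element deletion.

    Deleting at most one element keeps the implicit threshold: a
    five-element segment still needs four elements in order.
    """
    if tuple(pattern) in stored:
        return stored[tuple(pattern)]
    for i in range(len(pattern)):
        candidate = tuple(pattern[:i] + pattern[i + 1:])
        if candidate in stored:
            return stored[candidate]   # later deletions not constructed
    return None                        # default condition

print(match(["WHAT", "BE", "YOU", "MAIN", "PROBLEM"]))
# matches deletion (d) above -> 'ANSWER-PROBLEM'
```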
The transformations described above result in a progressive
coarsening of the patterns by deletion. Substitutions are also made
in certain cases. Some patterns contain pronouns which could stand
for a number of different things of importance to PARRY2. The
pattern:
    ( DO YOU AVOID THEM )
could refer to the Mafia, or racetracks, or other patients. When
such a pattern is recognized, the pronoun is replaced by its current
anaphoric value as determined by the response functions, and a more
specific pattern such as:
    ( DO YOU AVOID MAFIA )
is looked up. In many cases, the meaning of a pattern containing a
pronoun is clear without any substitution. In the pattern:
    (( HOW DO THEY TREAT YOU ) ( IN HOSPITAL ))
the meaning of THEY is clarified by ( IN HOSPITAL ).
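
A sketch of this recycle step follows. The anaphora table and the
response-function name are illustrative; in PARRY2 the pronoun's
current value is supplied by the response functions.

```python
# A sketch of anaphoric substitution and re-matching. The anaphoric
# values and the stored pattern's response name are illustrative.
ANAPHORA = {"THEM": "MAFIA", "THEY": "MAFIA"}
STORED = {("DO", "YOU", "AVOID", "MAFIA"): "DENY-AVOIDING-MAFIA"}

def recycle(pattern, stored=STORED):
    """Replace pronouns by their current values and look up again."""
    substituted = [ANAPHORA.get(w, w) for w in pattern]
    return stored.get(tuple(substituted))

print(recycle(["DO", "YOU", "AVOID", "THEM"]))
# -> 'DENY-AVOIDING-MAFIA'
```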

COMPOUND-PATTERN MATCH

When more than one simple pattern is detected in the input, a
second matching is attempted. The methods used are similar to the
first matching except that they operate at the segment level rather
than at the single-element level. Certain patterns, such as ( HELLO )
and ( I THINK ), are dropped because they are considered meaningless.
If a complete match is not found, then simple patterns are dropped,
one at a time, from the compound pattern. This allows the input:
    (( HOW DO YOU COME ) ( TO BE ) ( IN HOSPITAL ))
to match the stored pattern:
    (( HOW DO YOU COME ) ( IN HOSPITAL ))
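
The segment-level analogue is sketched below. The stored compound
pattern is the text's example; its response name (EXPLAIN-ADMISSION)
and the "meaningless" list entries are illustrative.

```python
# A sketch of the compound-pattern match: drop meaningless simple
# patterns, then drop one remaining segment at a time.
MEANINGLESS = {("HELLO",), ("I", "THINK")}
STORED = {(("HOW", "DO", "YOU", "COME"), ("IN", "HOSPITAL")):
          "EXPLAIN-ADMISSION"}

def compound_match(segments, stored=STORED):
    kept = [tuple(s) for s in segments if tuple(s) not in MEANINGLESS]
    if tuple(kept) in stored:
        return stored[tuple(kept)]
    # The same move as the element-level fuzzy match, one level up.
    for i in range(len(kept)):
        candidate = tuple(kept[:i] + kept[i + 1:])
        if candidate in stored:
            return stored[candidate]
    return None   # default condition: response functions take over

print(compound_match([["HOW", "DO", "YOU", "COME"],
                      ["TO", "BE"],
                      ["IN", "HOSPITAL"]]))
# -> 'EXPLAIN-ADMISSION'
```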

If no match can be found at this point, the algorithm has
arrived at a default condition and the appropriate response functions
decide what to do. For example, in a default condition, the model
may assume control of the interview, asking the interviewer a
question, continuing with the topic under discussion, or introducing
a new topic.

ADVANTAGES AND LIMITATIONS

As mentioned, one of the main advantages of a
characterization strategy is that it can ignore as irrelevant what it
does NOT recognize. There are several million words in English,
each possessing anywhere from one to one hundred senses. To construct
a machine-usable word dictionary of this magnitude is out of the
question at this time. Recognition of natural language input such
as that described above allows real-time interaction in a dialogue,
since it avoids becoming ensnared in combinatorial disambiguations
and long chains of inference which would slow a dialogue algorithm
down to impracticality, if it could function at all. The price paid
for pattern matching is that sometimes, but rarely, ambiguities slip
through.
A drawback to PARRY1 was that it reacted to the first pattern
it found in the input rather than characterizing the input as fully
as possible and then deciding what to do based on a number of tests.
Another practical difficulty with PARRY1, from a programmer's
viewpoint, was that elements of the patterns were strung out in
various procedures throughout the algorithm. It was often a
considerable chore for the programmer to determine whether a given
pattern was present and precisely where it was. In the method
described above, the patterns are all collected in one part of the
data base, where they can easily be examined.
Concentrating all the patterns in the data base gives PARRY2
a limited "learning" ability. When an input fails to match any
stored pattern, or matches an incorrect one as judged by a human
operator, a pattern matching the input can be put into the data base
automatically. If the new pattern has the same meaning as a
previously stored pattern, the human operator must provide the name
of the appropriate response function. If he does not remember the
name, he may rephrase the input in a form recognizable to PARRY2,
which will name the response function associated with the
rephrasing. These mechanisms are not "learning" in the commonly used
sense, but they do allow a person to transfer his knowledge into
PARRY2's data base with very little redundant effort.
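
A sketch of how such a pattern might be added appears below. The
data-base representation (a pattern-to-response-name table) and all
names are illustrative assumptions.

```python
# A sketch of the pattern-adding mechanism. The operator either names
# the response function or rephrases the input so the model can name
# it. The table and names here are illustrative.
def add_pattern(stored, new_pattern, response_name=None, rephrasing=None):
    if response_name is None and rephrasing is not None:
        # Recover the name from a rephrasing the model recognizes.
        response_name = stored.get(tuple(rephrasing))
    if response_name is None:
        raise ValueError("a response-function name is required")
    stored[tuple(new_pattern)] = response_name

stored = {("WHAT", "BE", "YOU", "JOB"): "ANSWER-JOB"}
add_pattern(stored, ("WHAT", "BE", "YOU", "TRADE"),
            rephrasing=("WHAT", "BE", "YOU", "JOB"))
print(stored[("WHAT", "BE", "YOU", "TRADE")])   # 'ANSWER-JOB'
```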

PERFORMANCE

We have a number of performance measures on PARRY1 along a
number of dimensions, including "linguistic non-comprehension". That
is, judges estimated PARRY1's abilities along this dimension on a 0-9
scale. They also rated human patients and a "random" version of
PARRY1 in this manner. (GIVE BAR-GRAPH HERE AND DISCUSS) We have
collected ratings of PARRY2 along this dimension to determine whether
the characterization process represents an improvement over PARRY1.
(FRANK AND KEN EXPERIMENT)